Straight to the Scene of the Crime: What's the Truth Behind the 10x TCP Latency?
Alimei's editor note (阿里妹导读): What is experience? It is running into problems, solving them, and distilling methods from the process. Run into enough problems and solve enough of them, and experience accumulates on its own. Today's article comes from a problem Alibaba technical expert Zhejian (蛰剑) hit at work, and the chain of thinking it triggered about TCP performance and the relationship between the send and receive buffers. The problem: an application accessed a service on Alibaba Cloud from the office over a 100 Mbit/s leased line with a 20 ms RTT, and a SQL query that returned 22 MB of data showed a latency more than 10x what it should be, which was clearly abnormal. Hopefully you can draw some inspiration from it as well.
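Before digging in, a quick sanity check on the numbers in the problem statement helps frame the investigation. A minimal sketch of the arithmetic; the 64 KB effective window below is an assumption for illustration, not a measured value:

public class BdpEstimate {
    public static void main(String[] args) {
        double bandwidthBitsPerSec = 100e6; // 100 Mbit/s leased line
        double rttSec = 0.020;              // 20 ms round trip
        double bdpBytes = bandwidthBitsPerSec / 8 * rttSec;
        System.out.printf("pipe needs a ~%.0f KB window to stay full%n", bdpBytes / 1024);

        // Assumed 64 KB effective window (illustration only): throughput
        // collapses to window/RTT once the window, not the bandwidth, binds.
        double windowBytes = 64 * 1024;
        double payloadBytes = 22 * 1024 * 1024; // the 22 MB result set
        System.out.printf("at 100 Mbit/s: %.1f s, window-limited: %.1f s%n",
                payloadBytes / (bandwidthBitsPerSec / 8),
                payloadBytes / (windowBytes / rttSec));
    }
}

The bandwidth-delay product of this link is roughly 250 KB; if the effective TCP window is pinned well below that, the transfer time is set by window/RTT rather than by the line rate, which is exactly the shape of a 10x slowdown.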
Preface

First, the kernel parameters that govern TCP send/receive buffer behavior, as inspected on the machine in question:
$sudo sysctl -a | egrep "rmem|wmem|adv_win|moderate"
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
vm.lowmem_reserve_ratio = 256 256 32
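For reference, the three values in net.ipv4.tcp_rmem and net.ipv4.tcp_wmem are the per-socket minimum, default, and maximum. With net.ipv4.tcp_moderate_rcvbuf = 1 the kernel auto-tunes each connection's receive buffer within those bounds, while net.core.rmem_max and net.core.wmem_max cap only what an application may request explicitly via setsockopt.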
Packet capture: tcpdump + Wireshark
socket.setSendBufferSize(16*1024); // 16 KB send buffer
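This is the knob used to shrink the send buffer when reproducing the problem. A minimal repro sketch around it, assuming a hypothetical server.example:9000 endpoint and a ~22 MB payload:

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SmallSndbufClient {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            // Explicitly setting SO_SNDBUF locks the size and disables the
            // kernel's send-buffer auto-tuning for this socket.
            socket.setSendBufferSize(16 * 1024); // 16 KB send buffer
            socket.connect(new InetSocketAddress("server.example", 9000));
            OutputStream out = socket.getOutputStream();
            byte[] chunk = new byte[64 * 1024];
            long start = System.nanoTime();
            for (int i = 0; i < 350; i++) { // ~22 MB total
                out.write(chunk);           // blocks whenever SO_SNDBUF is full
            }
            out.flush();
            System.out.printf("took %.2f s%n", (System.nanoTime() - start) / 1e9);
        }
    }
}

To see where such a writer blocks in the kernel, the SystemTap probe below watches sk_stream_wait_memory: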
#!/usr/bin/stap
# Simple probe to detect when a process is waiting for more socket send
# buffer memory. Usually means the process is doing writes larger than the
# socket send buffer size or there is a slow receiver at the other side.
# Increasing the socket's send buffer size might help decrease application
# latencies, but it might also make it worse, so buyer beware.
# Typical output: timestamp in microseconds: procname(pid) event
#
# 1218230114875167: python(17631) blocked on full send buffer
# 1218230114876196: python(17631) recovered from full send buffer
# 1218230114876271: python(17631) blocked on full send buffer
# 1218230114876479: python(17631) recovered from full send buffer

probe kernel.function("sk_stream_wait_memory")
{
    printf("%u: %s(%d) blocked on full send buffer\n",
           gettimeofday_us(), execname(), pid())
}

probe kernel.function("sk_stream_wait_memory").return
{
    printf("%u: %s(%d) recovered from full send buffer\n",
           gettimeofday_us(), execname(), pid())
}
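If saved as, say, sk_stream_wait_memory.stp (filename hypothetical), the probe runs with sudo stap sk_stream_wait_memory.stp and requires kernel debuginfo; each blocked/recovered pair brackets the time a writer spent sleeping in sk_stream_wait_memory waiting for send-buffer space to free up.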
The wmem-related grep output (note the net.core values differ from the first listing above):

vm.lowmem_reserve_ratio = 256 256 32
net.core.wmem_max = 1048576
net.core.wmem_default = 124928
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.udp_wmem_min = 4096
net.ipv4.tcp_moderate_rcvbuf = 1 enables the kernel's receive-buffer auto-tuning within the tcp_rmem bounds; calling setsockopt(SO_RCVBUF) explicitly disables that auto-tuning for the socket.
sudo tc qdisc del dev eth0 root netem delay 20ms
sudo tc qdisc add dev eth0 root tbf rate 500kbit latency 50ms burst 15kb
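The first command removes a previously injected 20 ms netem delay on eth0; the second attaches a token-bucket filter that caps the link at 500 kbit/s with up to 50 ms of queueing latency and a 15 KB burst. This is presumably how the different bandwidth/RTT combinations in these experiments were emulated.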
Conclusions about wmem from this case
There was a batch insert statement that inserted 5,532 records in a single shot, with a total payload of 376 KB.

How a small SO_RCVBUF combined with a large RTT hurts performance
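A back-of-the-envelope sketch of why that hurts: ignoring slow start, a sender can push at most one receive window per RTT, so at 20 ms RTT the 376 KB payload costs roughly ceil(376K/window) round trips. The window sizes below are assumptions for illustration:

public class RcvbufPenalty {
    public static void main(String[] args) {
        double payloadBytes = 376 * 1024; // the 376 KB batch insert
        double rttSec = 0.020;            // 20 ms RTT
        for (int windowKb : new int[] {16, 64, 256}) {
            double windowBytes = windowKb * 1024.0;
            double roundTrips = Math.ceil(payloadBytes / windowBytes);
            System.out.printf("window=%3d KB -> ~%.0f RTTs = %.0f ms%n",
                    windowKb, roundTrips, roundTrips * rttSec * 1000);
        }
    }
}

With a 16 KB window the same payload needs about 24 round trips (roughly 480 ms), versus 2 round trips (about 40 ms) at 256 KB: the window, not the bandwidth, dominates.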
Candidate causes for the stall that were considered:

- The request SQL is too large: the server (the service on port 3306) reads the SQL from TCP into a pre-allocated block of memory, and growing that block when it is too small may trigger a full GC. But in the capture the very first buffer fill stalls, and every subsequent fill stalls too, so this does not look like the cause.
- The request carries multiple SQL statements at once: the application reads them, splits them up, executes the first, returns its response, and only then reads the second. But in the capture no response is returned before the stall, so this is not the cause either.
- ...other, unknown causes.
Application-code stalls, GC pauses, and the like.
Reads are polled across multiple shards in turn, which makes this kind of stall much less likely.
ss -itmpn dst "10.81.212.8"
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 10.xx.xx.xxx:22 10.yy.yy.yyy:12345 users:(("sshd",pid=1442,fd=3))
skmem:(r0,rb369280,t0,tb87040,f4096,w0,o0,bl0,d92)
Here we can see this socket has a receive buffer of 369,280 bytes and a transmit buffer of 87,040 bytes. Keep in mind that the kernel doubles any socket buffer allocation to allow for overhead: if a process asks for a 256 KiB buffer with setsockopt(SO_RCVBUF), it will get 512 KiB of buffer space. This is described in man 7 tcp.
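That doubling is easy to observe from Java, since getReceiveBufferSize() reports what getsockopt(SO_RCVBUF) returns. A minimal sketch; the exact number may be clamped by net.core.rmem_max:

import java.net.Socket;

public class BufferDoubling {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket()) {
            s.setReceiveBufferSize(256 * 1024); // ask for 256 KiB
            // Linux books twice the requested size for kernel overhead,
            // so this typically prints 524288 (unless clamped by rmem_max).
            System.out.println(s.getReceiveBufferSize());
        }
    }
}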
/* TCP initial congestion window as per rfc6928 */
#define TCP_INIT_CWND 10

/* 3. Try to fixup all. It is made immediately after connection enters
 *    established state.
 */
void tcp_init_buffer_space(struct sock *sk)
{
    int tcp_app_win = sock_net(sk)->ipv4.sysctl_tcp_app_win;
    struct tcp_sock *tp = tcp_sk(sk);
    int maxwin;

    if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK))
        tcp_sndbuf_expand(sk);

    /* initial maximum receive window: min(rcv_wnd, 10 * advmss) */
    tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss);
    tcp_mstamp_refresh(tp);
    tp->rcvq_space.time = tp->tcp_mstamp;
    tp->rcvq_space.seq = tp->copied_seq;

    maxwin = tcp_full_space(sk);

    if (tp->window_clamp >= maxwin) {
        tp->window_clamp = maxwin;

        if (tcp_app_win && maxwin > 4 * tp->advmss)
            tp->window_clamp = max(maxwin -
                                   (maxwin >> tcp_app_win),
                                   4 * tp->advmss);
    }

    /* Force reservation of one segment. */
    if (tcp_app_win &&
        tp->window_clamp > 2 * tp->advmss &&
        tp->window_clamp + tp->advmss > maxwin)
        tp->window_clamp = max(2 * tp->advmss, maxwin - tp->advmss);

    tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp);
    tp->snd_cwnd_stamp = tcp_jiffies32;
}
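Note the rcvq_space.space line above: the initial receive-window estimate is capped at TCP_INIT_CWND * advmss, i.e. roughly 10 × 1460 = 14,600 bytes for a typical MSS, and the auto-tuning grows it from there as it watches how fast the application drains data.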
net.ipv4.tcp_adv_win_scale = 2 // default on 2.6 kernels; in 3.1 the default is 1.
static inline int tcp_win_from_space(const struct sock *sk, int space)
{
    /* "space" is already 2 * SO_RCVBUF when passed in */
    int tcp_adv_win_scale = sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale;

    return tcp_adv_win_scale <= 0 ?
        (space >> (-tcp_adv_win_scale)) :
        space - (space >> tcp_adv_win_scale); /* sysctl tcp_adv_win_scale */
}
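To make the arithmetic concrete, a sketch that mirrors the function above in Java, showing how much of the (already doubled) buffer is advertised as window versus held back as overhead reserve:

public class WinFromSpace {
    // Mirrors the kernel's tcp_win_from_space() shown above.
    static int tcpWinFromSpace(int space, int tcpAdvWinScale) {
        return tcpAdvWinScale <= 0
                ? space >> -tcpAdvWinScale
                : space - (space >> tcpAdvWinScale);
    }

    public static void main(String[] args) {
        int soRcvbuf = 64 * 1024;
        int space = 2 * soRcvbuf; // the kernel doubles SO_RCVBUF
        System.out.println(tcpWinFromSpace(space, 1)); // 65536: half advertised
        System.out.println(tcpWinFromSpace(space, 2)); // 98304: three quarters
    }
}

With tcp_adv_win_scale = 1, exactly half of the doubled buffer, i.e. the original SO_RCVBUF value, becomes the maximum advertised window.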
To summarize:

- As a rule, never set SO_SNDBUF and SO_RCVBUF by hand in your program; the kernel's auto-tuning does a better job than you will.
- SO_SNDBUF is generally larger than the send sliding window, because data can only be released from SO_SNDBUF once it has been sent and ACKed.
- The relationship between the TCP receive window and SO_RCVBUF is complicated.
- A small SO_RCVBUF combined with a large RTT severely hurts performance.
- The receive window is far more complex than the send window.
- As an analogy to a courier delivery: the send window/SO_SNDBUF is the dispatch warehouse, the bandwidth/congestion window is how clear the road is, and the receive window/SO_RCVBUF is the receiving warehouse; the three together determine the transfer speed.